home *** CD-ROM | disk | FTP | other *** search
-
-
- Engineers concerned about performance,
-
- Performance is the buzz-word now at SGI. The question has been asked
- "Where can you find bloat?" The joke answer is "Anywhere you look."
- It's funny for about one second.
-
- There real question is how do you locate the bloat itself. A program's
- start up time or its memory usage is just a symptom; you've got to find
- the causes if you want to find the cures.
-
- This posting describes how to find and hunt down program bloat in X
- Window System clients. The hope is that this information will arm
- individual engineers with the tools they need to first find and then
- eliminate the bloat in their X applications.
-
- The primary tool used for determining how your application is using X
- is "xscope" found in the x_eoe.sw.Xapps subsystem. xscope is an X
- protocol trace gatherer. It acts as a pseudo-server between the X
- client and the X server. It watches X protocol can prints the decoded
- X protocol stream to stdout. The following diagram should explain how
- xscope works:
-
- /-----------\ /--------\ /--------\
- | | X requests | | X requests | |
- | arbitrary | ------------> | | ------------> | |
- | X client | <------------ | xscope | <------------ | Xsgi |
- | | X events, | | X events, | |
- | | errors, | | errors, | |
- \-----------/ replies \--------/ replies \--------/
- |
- |
- v
- decoded X
- protocol trace
-
- There is a man page on xscope but here is a brief example you can use
- to generate an X protocol trace:
-
- 1) start a shell
-
- 2) assuming you want to connect the machine foobar, set your
- DISPLAY environment variable to foobar:0
-
- 2) run "xscope |& tee XProtoTrace" (XXX csh command)
-
- 3) start a second shell
-
- 4) in that second shell, set your DISPLAY environment variable
- to foobar:1 (xscope will act to pass X protocol from foobar's
- display 1 to the real X server's display 0).
-
- 5) now start up your X client
-
- 6) once the X application is fully started, Control-C it
-
- 7) now Control-C the xscope
-
- 8) the file "XProtoTrace" now contains an X protocol trace for
- your client starting up
-
- Now you can examime the X protocol trace and understand what your
- program is doing at the X protocol level. It helps to be familar with
- the X protocol. To help you understand the X protocol, I recommend you
- having the O'Reilly "X Protocol Reference Manual (Volume 0)" nearby.
- In particular, Appendix I will help you map between X protocol requests
- and Xlib C routines that generate X protocol.
-
- Decoding X protocol is the window system equivalent of system call
- tracing for normal C programs. The mapping between Xlib and the X
- protocol is fairly straightforward so if you are familar with Xlib,
- seeing the protocol shouldn't be too daunting.
-
- The first thing to do is "vi XProtoTrace" and look through it.
-
- Here's the start of an X protocol trace from xlogo:
-
- ------------------------------------------------------------------------------------
- 0.00: Client --> 12 bytes
- byte-order: MSB first
- major-version: 000b
- minor-version: 0000
- 0.04: 520 bytes <-- X11 Server
- protocol-major-version: 000b
- protocol-minor-version: 0000
- release-number: 00001388
- resource-id-base: 0b800000
- resource-id-mask: 003fffff
- motion-buffer-size: 00000000
- image-byte-order: MSB first
- bitmap-format-bit-order: MSB first
- bitmap-format-scanline-unit: 20
- bitmap-format-scanline-pad: 20
- min-keycode: 10 (^J)
- max-keycode: 118 (v)
- vendor: "Silicon Graphics"
- pixmap-formats: (4)
- roots: (2)
- 0.05: Client --> 84 bytes
- ............REQUEST: CreateGC
- graphic-context-id: GXC 0b800000
- drawable: DWB 0000002a
- value-mask: background
- value-list:
- background: 00000007
- ............REQUEST: CreateGC
- graphic-context-id: GXC 0b800001
- drawable: DWB 0000002c
- value-mask: background
- value-list:
- background: 00000007
- ............REQUEST: QueryExtension
- name: "BIG-REQUESTS"
- ............REQUEST: GetProperty
- delete: False
- window: WIN 0000002a
- property: <RESOURCE_MANAGER>
- type: <STRING>
- long-offset: 00000000
- 0.10: 104 bytes <-- X11 Server
- ..............REPLY: QueryExtension
- present: False
- major-opcode: 00
- first-event: 00
- first-error: 00
- ..............REPLY: GetProperty
- format: 08
- type: <STRING>
- bytes-after: 00000000
- value: "xcalc*Command.ShapeStyle:^Irectangle ^J"
- ------------------------------------------------------------------------------------
-
- Notice the way the output says "Client -->" and "<-- X11 Server". That
- indicates who is saying what. Clients make requests. The X server
- returns replies (for those requests that return replies), events, and
- errors. CreateGC is a request without a reply. QueryExtension is a
- request that returns a reply.
-
- Browse through the output of an X protocol trace and get comfortable
- with the format.
-
- To get a high level statistical overview of what the X protocol trace
- contains, I've written a shell/awk script that does the trick. You can
- find a copy in this directory (~4Dgifts/toolbox/src/swtools/perf).
-
- To use the script say "xprotostats < XProtoTrace". Here is some sample
- output for iconbook's startup from the 5.1.2 alpha 60:
-
- ------------------------------------------------------------------------------------
- hoot:/usr/people/mjk#54 -> xprotostats < ICONBOOK
-
- RAW CLIENT PROTOCOL STATISITICS:
- bytes written: 41488
- batches written: 409
- requests made: 1850
- RAW SERVER PROTOCOL STATISITICS:
- bytes written: 74412
- batches written: 419
- replies made: 420
- events sent: 104
- errors sent: 0
- COMPUTED STATISTICS:
- replies/request: 22.70 %
- server/client data: 179.36 %
- bytes/request: 22.43 bytes
- OFTEN MISUSED ROUND-TRIP REQUESTS:
- AllocColor 42
- AllocNamedColor 8
- GetGeometry 30
- GetInputFocus 30
- GetKeyboardMapping 17
- GetSelectionOwner 2
- GetWindowAttributes 20
- InternAtom 49
- ListFonts 3
- ListProperties 3
- QueryBestSize 1
- QueryColors 170
- QueryExtension 3
- QueryFont 10
- QueryTree 2
- TranslateCoordinates 1
- OFTEN MISUSED EVENTS:
- NoExposure 31
-
- hoot:/usr/people/mjk#55 ->
- ------------------------------------------------------------------------------------
-
- So what does this mean? How does one target bloat using this output?
-
- Here are some observations I've made from the ouput:
-
- 1) Notice that the server writes 72.7 kilobytes to the client but
- the client only sends 40.5 kilobytes to the server. That's
- pretty lop-sided.
-
- Examining the protocol trace with an editor, there are a bunch of
- explanations:
-
- o Motif makes 17 GetKeyboardMapping requests. Each one returns
- the *same* 904 byte reply. There's 15 kilobytes right
- there. Only such call is necessary.
-
- o Examining the QueryFont requests, the "fixed" font is queried
- 3 times (once as "Fixed"; twice as "fixed"). Each query
- reply is 2.87 kilobytes.
-
- o There are lots of XQueryColors requests being done querying
- the color of a single pixel. It takes 12 bytes to issue the
- request but the reply is 40 bytes. Most of the calls are
- really unncessary and shift the IO burden to the X server.
- Probably about ~3 kilobytes.
-
- o Some of the size is understandable because I happen to
- have a rather large ~/.Xresources file that gets placed on
- the root window's RESOURCE_MANAGER property and must be
- downloaded to every client. This is about 4.4 kilobytes.
-
- 2) Notice the 31 NoExposure events. These are generated by CopyArea
- or CopyPlane requests to indicate that the copy was successful
- (if your copy operation involved obscured regions of windows,
- Graphics Expose events tell you the regions unavaiable for
- copying).
-
- For iconbook, all the CopyAreas done from pixmaps to windows.
- Pixmaps never are obscured so the NoExposure can always be
- assumed. You can configure the graphics context used with
- CopyArea to not generate GraphicsExpose and NoExposure events
- altogether. This should be done when you _know_ the source
- drawable is a pixmap.
-
- Generating useless events forces your client to wake up
- unnecessarily. A bad example of this is fm (used to?) generate
- its blinking cursor using CopyArea from a pixmap. Everytime the
- cursor blinked, the app would later be woken up again to get the
- unnnecessary NoExposure event. So if the cursor needed to blink
- once a second, the app would have to be scheduled twice that
- often.
-
- 3) Notice the 49 InternAtom requests. Each is a round-trip to the
- server. Realize that a round-trip requites the client to write a
- request, be descheduled, the X server to be scheduled, the X server to
- read, and then process the request, write the reply, be
- descheduled, and reschedule the client so it can read the reply.
-
- We can only do *maximum* 600 to 800 round-trips per second.
- Requests that don't need replies can be batched allowing up to
- 200,000 requests to be done a second. Consider 20,000 a good
- guess. That a two-order of magnitude hit for doing an X protocol
- round-trip.
-
- If you examine the trace to find the atoms that are interned, 14
- atoms are interned for the SoundScheme protocol (though at most 3
- will probably ever be used by iconbook). The desks support
- requires 5 atoms to be interned (even if desks aren't in use!).
- Two atoms are interned for using SGI's help protocol even though
- help hasn't even been requested yet.
-
- These 21 InternAtom's represent an (estimated) 1/35 second delay
- in start up most toto programs.
-
- 4) The 170 QueryColors requests seem suspect.
-
- The request maps a set of pixel values to RGB triples. Normally
- this request would be a seldom used request since most clients
- want to map an RGB triple or color name to a pixel value and not
- vice versa. But due to a performance bug in the Buffy Motif,
- QueryColors creates tons of round-trips.
-
- Additionally, the request allows multiple pixel values to be
- mapped to RGB triples but most of the time, QuerColor is sent
- with a single pixel value. Quite inefficient.
-
- 5) The GetGeometry request returns the height, width, and depth of
- an X drawable (pixmap or window). It is called 30 times (each call
- generating a round trip). If you examine the protocol trace,
- GetGeometry is called of only a few requests after the request
- that created the drawable. One must wonder why the height,
- width, and depth for a drawable must be retrieved when those
- parameters had to specified by the program moments before. The
- answer is a performance bug in Motif.
-
- Usually GetGeometry and GetWindowAttributes rarely need to be
- called. For windows your X client created, it should already
- know the information. Most of the 20 GetWindowAttributes
- requests appear to be unnecessary.
-
- These are the type of insights that are motivated mostly from the
- results of xprotostats. [BUGs are filed on the problems discussed
- above; many of the problems have been fixed though we are still waiting
- for the fixes to make it into "an alpha near you."]
-
- Another use of xprotostats is to see how different alphas affect the
- number of X requests your application does. If a new alpha comes out,
- you'll be able to quickly detect and report a performance regression in
- Motif or Xt or Xlib that has affected your program.
-
- xprotostats only gets you started and pointed in the right direction.
- The real insights come from reading the trace and understanding what's
- going on. Here are somethings to look for:
-
- 1) Duplicated requests, particularly requests that are querying
- information.
-
- 2) Rendering to windows that have not yet been mapped.
-
- 3) Pixmaps being created and freed without ever being rendered too.
-
- 4) Round-trip requests that are returning information about objects
- the client created (and should already know the information
- for).
-
- 5) Mapping already mapped windows; unmapping already unmapped
- windows.
-
- 6) Interning of dubious atoms.
-
- 7) Repeatedly calling ClearArea on a cleared window.
-
- 8) Querying and opening already open fonts.
-
- 9) Creating graphics contexts and destroying them without ever using
- them
-
- Digging deeper: once you've isolated a botched protocol sequence, you need
- to associate the protocol event with what the program is doing.
-
- This actually turns out to be difficult but not impossible. The approach
- I use is to run the program in dbx and also trace the protocol stream using
- xscope. If you are looking for problems during startup, that helps since
- the startup will be mostly deterministic. Look for landmark Xlib requests
- before the sequence you are interested in (things like OpenFont are fairly
- rare and good canidates for landmarks). Then set dbx breakpoints in the
- landmark Xlib routines. Once you get to the landmark, then advance
- carefully by setting breakpoints in routines you expect to hit. You can
- use the xscope output to follow your progress. When you finally get to
- the right place, type "where" in dbx and find out what's going on.
-
- Another good source of landmarks is to use "par -s -SS -p <pid>" on the
- dbx'ed process to get system call landmarks.
-
- You'll definitely be in Xlib and probably deep in your toolkit or
- widget set. From the complete stack trace, you should be able to get a
- good guess what's going on. At this point refer to the source code
- for Xlib and Xt and Motif and figure out what's going on.
-
- The following dbx commands will help you correlate the routines with
- their source code:
-
- for Xlib: directory /hosts/bonnie.wpd/proj/sherwood/isms/x/mit/lib/X
- for Xt: directory /hosts/bonnie.wpd/proj/sherwood/isms/x/mit/lib/Xt
- for Motif: directory /hosts/bonnie.wpd/proj/sherwood/isms/motif/src/lib/XmRoxy
- for ViewKit: directory /hosts/bonnie.wpd/proj/sherwood/isms/vk/vk/src/base
- directory /hosts/bonnie.wpd/proj/sherwood/isms/vk/vk/src/components
-
- Using dbx's "up", "down", and "edit" commands you should be able to figure out
- what's going on. Experience does help.
-
- Future work: I'd like to write some more sophisticated awk scripts that can
- actually look for certain botched sequences. Other helpful scripts might allow
- comparisons between two protocol traces so improvements and/or regressions can
- be noticed. Another helpful thing would be the ability to insert tracing
- information into the X protocol stream using uninterpreted data section of
- the X NoOperation request. Someone else might consider a CASE tool that
- worked by using a special Xlib that saved the protocol right in the library
- (meaning no psuedo-server was necessary). This would also make it much easier
- to tie X protocol to your program code.
-
- I really hope engineers take this information seriously and make an effort to
- understand what their X clients are doing.
-
- - Mark Kilgard
- May, 1993
- mjk@sgi.com
-
-